Kohonen maps on hand-written digits

Nicolas Casademont & Teo Stocco

  1. [x] load data and selected data given by name2digits function
  2. [x] Kohonen network
    • 6x6 unit distance grid,
    • gaussian neighbordhood with constant std 3
    • small constant learning rate
    • report "how you decide when your algorithm has converged"
    • visualize your prototypes and describe result
    • find a way to assign one digit to each prototype
  3. [ ] explore
    • different sizes of map (at least 3, not less than 36 units)
    • explore different width of neighborhood function
    • describe "role of witdth"
    • does the optimal width depend on map size?
    • start with large learning rate, reduce it over time, any improvements?
  4. [ ] report (max. 4 pages)
    • choice of learning rate, description on convergence detection
    • visualization, description of learnt prototypes
    • visualization, description of digit-prototype assignment
    • results of network sizes and width exploration, discussion
    • results of varying width of neighborhood over time, discussion
In [22]:
import numpy as np
import matplotlib.pyplot as plt

from helpers import name2digits

%matplotlib inline
%reload_ext autoreload
%autoreload 2

1 Setup

In [2]:
digits = name2digits('nicolas+teo')
digits
Out[2]:
array([1, 2, 6, 8], dtype=uint8)
In [3]:
labels_all = np.loadtxt('labels.txt', dtype=np.int)
labels_all.shape
Out[3]:
(5000,)
In [4]:
labels = labels_all[np.in1d(labels_all, digits)]
labels.shape
Out[4]:
(2000,)
In [5]:
data_all = np.loadtxt('data.txt', dtype=np.int)
data_all.shape
Out[5]:
(5000, 784)
In [6]:
data = data_all[np.in1d(labels_all, digits), :]
data.shape
Out[6]:
(2000, 784)

2 Kohonen network

In [7]:
def neighborhood(x, mean, std):
    """Normalized neighborhood gaussian-like with mean and std."""
    return np.exp(- np.square(x - mean) / (2 * np.square(std)))
In [8]:
def som_step(centers, datapoint, neighbor, eta, sigma):
    """Learning step self-organized map updating inplace centers.
         centers   (matrix) cluster centres (center X dimension)
         datapoint (vector)
         neighbor  (matrix) coordinates of all centers
         eta       (scalar) learning rate
         sigma     (scalar) width/variance of neighborhood function
    """    
    k = np.argmin(np.sum(np.square(centers - datapoint), axis=1))
    k_coords = np.array(np.nonzero(neighbor == k))
        
    for j in range(len(centers)):
        j_coords = np.array(np.nonzero(neighbor == j))
        disc = neighborhood(np.linalg.norm(k_coords - j_coords), 0, sigma)
        centers[j, :] += disc * eta * (datapoint - centers[j, :])
    
    return np.sum(np.square(centers - datapoint)) / len(centers)
In [9]:
# total dimension
dim = 28 * 28
# dimension support
data_range = 255
# Kohonen map border size
size_k = 6
plt.rcParams['figure.figsize'] = (size_k, size_k)
# width/variance of neighborhood function
sigma = 3.0
# learning rate
eta = 0.005
# maximal iteration count
tmax = 5000

We can check for convergence under mean square criteria. Once the algorithm does not improve this score, it has converged.

In [10]:
np.random.seed(0)

# centers randomly initialized
centers = np.random.rand(size_k ** 2, dim) * data_range

# neighborhood matrix
neighbor = np.arange(size_k ** 2).reshape((size_k, size_k))

# random order in which the datapoints should be presented
i_random = np.arange(tmax) % len(data)
np.random.shuffle(i_random)

scores = []
history = []

for t, i in enumerate(i_random):
    # at each iteration, compute the step and store the state
    score = som_step(centers, data[i, :], neighbor, eta, sigma)
    scores.append(score)

# show scores
plt.title('Scores per iteration')
plt.plot(scores)
plt.ylabel("score")
plt.xlabel("iteration")
plt.axvline(np.argmin(scores), color='red')
plt.show()

# visualize prototypes
plt.title('prototypes at best score')
for i in range(size_k ** 2):
    plt.subplot(size_k, size_k, i + 1)
    plt.imshow(centers[i,:].reshape([28, 28]), interpolation='bilinear', cmap='Greys')
    plt.axis('off')

plt.show()

We can see that each corner represents one of the four digits. In between the prototypes varies to pass to one digit to another.

In [11]:
closest_corners = []
corners = [[0, 0], [size_k - 1, 0], [0, size_k -1], [size_k, size_k]]
# for each entry, get closest corner
for e in data:
    diff = [np.sum(np.square(centers[i, :] - e)) for i in range(size_k ** 2)]
    coords = np.ravel(np.nonzero(neighbor == np.argmin(diff)))
    dists = np.linalg.norm(corners - coords, axis=1)
    closest_corners.append(np.argmin(dists))
closest_corners = np.array(closest_corners)
In [12]:
digits_assign = {}
for d in digits:
    digit_corners = closest_corners[np.where(labels == d)]
    # at least one bucket for each corner to avoid misindexing TODO explain ?
    counts = np.bincount(np.r_[digit_corners, range(4)])
    major_corner = np.argmax(counts)
    digits_assign[major_corner] = d
In [13]:
closest_proto = []
# for each entry, get closest corner
for e in data:
    diff = [np.sum(np.square(centers[i, :] - e)) for i in range(size_k ** 2)]
    coord = np.argmin(diff)
    closest_proto.append(coord)
closest_proto = np.array(closest_proto)
In [17]:
proto_assign = {}

for p in range(size_k**2):
    labels_present, counts = np.unique(labels[closest_proto == p], return_counts=True)
    proto_assign[p] = (labels_present, counts, labels_present[np.argmax(counts)])
In [18]:
plt.title('prototypes at best score, with labels')

for i in range(size_k ** 2):
    plt.subplot(size_k, size_k, i + 1)
    
    plt.title(proto_assign[i][2])
    plt.imshow(centers[i,:].reshape([28, 28]), interpolation='bilinear', cmap='Greys')
    plt.axis('off')
    
plt.show()
In [20]:
plt.title('prototypes at best score, with label confidence (%)')

for i in range(size_k ** 2):
    plt.subplot(size_k, size_k, i + 1)
    
    labels_present = proto_assign[i][0]
    counts = proto_assign[i][1]
    tot_counts = np.sum(counts)
    res = ""

    for l,c in zip(labels_present, counts):
        res += str(l) + "("
        res += str(int(c / tot_counts * 100))
        res += ") "

    plt.title(res, fontsize=5)
    plt.imshow(centers[i,:].reshape([28, 28]), interpolation='bilinear', cmap='Greys')
    plt.axis('off')
    
plt.show()
In [21]:
labels_assign = [digits_assign.get(c) for c in closest_corners]
np.count_nonzero(labels_assign != labels) / labels.shape[0]
Out[21]:
0.3065

Exploration

In [27]:
from helpers import apply_kohonen
from helpers import label_assignements
In [25]:
centers = apply_kohonen(data)
In [37]:
label_assignements(data, labels, centers, size_k, False)
In [89]:
size_k_arr = np.linspace(6, 16, 5, dtype=np.dtype('int16'))
sigma_arr = range(1,16, 2)
In [90]:
for size_k in size_k_arr:
    for sigma in sigma_arr:
        print("----------------------------------")
        print("kohnen map for size_k =", size_k, "and sigma =", sigma)
        centers = apply_kohonen(data, size_k=size_k, sigma=sigma)
        label_assignements(data, labels, centers, size_k, False)
----------------------------------
kohnen map for size_k = 6 and sigma = 1
----------------------------------
kohnen map for size_k = 6 and sigma = 3
----------------------------------
kohnen map for size_k = 6 and sigma = 5
----------------------------------
kohnen map for size_k = 6 and sigma = 7
----------------------------------
kohnen map for size_k = 6 and sigma = 9
----------------------------------
kohnen map for size_k = 6 and sigma = 11
----------------------------------
kohnen map for size_k = 6 and sigma = 13
----------------------------------
kohnen map for size_k = 6 and sigma = 15
----------------------------------
kohnen map for size_k = 9 and sigma = 1
----------------------------------
kohnen map for size_k = 9 and sigma = 3
----------------------------------
kohnen map for size_k = 9 and sigma = 5
----------------------------------
kohnen map for size_k = 9 and sigma = 7
----------------------------------
kohnen map for size_k = 9 and sigma = 9
----------------------------------
kohnen map for size_k = 9 and sigma = 11
----------------------------------
kohnen map for size_k = 9 and sigma = 13
----------------------------------
kohnen map for size_k = 9 and sigma = 15
----------------------------------
kohnen map for size_k = 13 and sigma = 1
----------------------------------
kohnen map for size_k = 13 and sigma = 3
----------------------------------
kohnen map for size_k = 13 and sigma = 5
----------------------------------
kohnen map for size_k = 13 and sigma = 7
----------------------------------
kohnen map for size_k = 13 and sigma = 9
----------------------------------
kohnen map for size_k = 13 and sigma = 11
----------------------------------
kohnen map for size_k = 13 and sigma = 13
----------------------------------
kohnen map for size_k = 13 and sigma = 15
----------------------------------
kohnen map for size_k = 16 and sigma = 1
----------------------------------
kohnen map for size_k = 16 and sigma = 3
----------------------------------
kohnen map for size_k = 16 and sigma = 5
----------------------------------
kohnen map for size_k = 16 and sigma = 7
----------------------------------
kohnen map for size_k = 16 and sigma = 9
----------------------------------
kohnen map for size_k = 16 and sigma = 11
----------------------------------
kohnen map for size_k = 16 and sigma = 13
----------------------------------
kohnen map for size_k = 16 and sigma = 15
----------------------------------
kohnen map for size_k = 20 and sigma = 1
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-90-16f00a58cf95> in <module>()
      4         print("kohnen map for size_k =", size_k, "and sigma =", sigma)
      5         centers = apply_kohonen(data, size_k=size_k, sigma=sigma)
----> 6         label_assignements(data, labels, centers, size_k, False)

/Users/nicolas/Library/Mobile Documents/com~apple~CloudDocs/Documents/EPFL/MA1/CS-434 Unsupervised and reinforcement learning in neural networks/proj1/helpers.py in label_assignements(data, labels, centers, size_k, plot_prob)
     82 
     83     for i in range(size_k ** 2):
---> 84         plt.subplot(size_k, size_k, i + 1)
     85 
     86         plt.title(proto_assign[i][2])

/usr/local/lib/python3.5/site-packages/matplotlib/pyplot.py in subplot(*args, **kwargs)
   1028 
   1029     fig = gcf()
-> 1030     a = fig.add_subplot(*args, **kwargs)
   1031     bbox = a.bbox
   1032     byebye = []

/usr/local/lib/python3.5/site-packages/matplotlib/figure.py in add_subplot(self, *args, **kwargs)
   1006 
   1007         self._axstack.add(key, a)
-> 1008         self.sca(a)
   1009         a._remove_method = self.__remove_ax
   1010         self.stale = True

/usr/local/lib/python3.5/site-packages/matplotlib/figure.py in sca(self, a)
   1361     def sca(self, a):
   1362         'Set the current axes to be a and return a'
-> 1363         self._axstack.bubble(a)
   1364         for func in self._axobservers:
   1365             func(self)

/usr/local/lib/python3.5/site-packages/matplotlib/figure.py in bubble(self, a)
    107         stack, to the top.
    108         """
--> 109         return Stack.bubble(self, self._entry_from_axes(a))
    110 
    111     def add(self, key, a):

/usr/local/lib/python3.5/site-packages/matplotlib/cbook.py in bubble(self, o)
   1400                 bubbles.append(thiso)
   1401             else:
-> 1402                 self.push(thiso)
   1403         for thiso in bubbles:
   1404             self.push(o)

/usr/local/lib/python3.5/site-packages/matplotlib/cbook.py in push(self, o)
   1365         occurring later than the current position are discarded
   1366         """
-> 1367         self._elements = self._elements[:self._pos + 1]
   1368         self._elements.append(o)
   1369         self._pos = len(self._elements) - 1

KeyboardInterrupt: 
In [ ]: